End-to-end speech emotion recognition based on multi-head attention

Lei YANG, Hongdong ZHAO, Kuaikuai YU

Journal of Computer Applications 2022, 42 (6): 1869-1875. DOI: 10.11772/j.issn.1001-9081.2021040578

Speech emotion datasets are small and high-dimensional. Traditional Recurrent Neural Networks (RNNs) lose long-range dependencies, and Convolutional Neural Networks (CNNs), which focus on local information, do not fully exploit the latent relationships between frames within the input sequence. To address these problems, a new neural network, MHA-SVM, based on Multi-Head Attention (MHA) and Support Vector Machine (SVM), was proposed for Speech Emotion Recognition (SER). First, the original audio data were input into the MHA network to train the MHA parameters and obtain the MHA classification results. Then, the same original audio data were input into the pre-trained MHA network again for feature extraction. Finally, the extracted features, after passing through the fully connected layer, were fed into the SVM to obtain the MHA-SVM classification results. After a full evaluation of how the number of heads and layers in the MHA module affects the experimental results, MHA-SVM was found to achieve its highest recognition accuracy of 69.6% on the IEMOCAP dataset. Experimental results indicate that the end-to-end model based on the MHA mechanism is more suitable for SER tasks than models based on RNN and CNN.
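The feature-extraction step described in the abstract can be illustrated with a minimal NumPy sketch of multi-head self-attention over a sequence of audio frames. This is not the authors' implementation: the function name `multi_head_self_attention`, the dimensions (50 frames, 64 features, 8 heads), the random weights standing in for trained parameters, and the mean pooling into a single utterance vector are all assumptions made for illustration. It only shows how every frame attends to every other frame in one step, which is the property the abstract contrasts with RNNs and CNNs.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(frames, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention with several heads.

    frames: array of shape (T, d_model), one row per audio frame.
    Returns the attended frames (T, d_model) and the attention
    weights (num_heads, T, T).
    """
    T, d_model = frames.shape
    d_head = d_model // num_heads

    def split(x):
        # (T, d_model) -> (num_heads, T, d_head)
        return x.reshape(T, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(frames @ w_q), split(frames @ w_k), split(frames @ w_v)

    # Each frame scores every other frame, regardless of distance in time.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    context = weights @ v  # (num_heads, T, d_head)
    merged = context.transpose(1, 0, 2).reshape(T, d_model)
    return merged @ w_o, weights


# Hypothetical sizes and random weights; a trained network would supply these.
rng = np.random.default_rng(0)
T, d_model, num_heads = 50, 64, 8
frames = rng.standard_normal((T, d_model))
w_q, w_k, w_v, w_o = (
    rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(4)
)

out, weights = multi_head_self_attention(frames, w_q, w_k, w_v, w_o, num_heads)

# Assumed pooling step: average over frames to get one vector per utterance.
utterance_feature = out.mean(axis=0)
```

In a two-stage setup like the one the abstract describes, a vector such as `utterance_feature` would be computed for each utterance by the pre-trained network and then passed to a separate SVM classifier; how the paper pools frames or configures the SVM is not stated in the abstract.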
